Mining Community Structure of Named Entities from Web Pages and Blogs
Abstract
Although community discovery based on social networks has been studied extensively in the Web hyperlink environment, limited research has been done in the case of Web documents. The co-occurrence of words and entities in sentences and documents usually implies some connection among them, and studying such connections may reveal important relationships. In this paper, we investigate the co-occurrences of named entities in Web pages and blogs, and mine communities among those entities. We show that identifying communities in such an environment can be transformed into a graph clustering problem. A hierarchical clustering algorithm is then proposed, which exploits triangle structures within the graph and the mutual information between vertices. Our empirical study shows that the proposed algorithm is promising for discovering communities from Web documents.
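As a rough illustration of the ingredients the abstract names, the sketch below builds a co-occurrence graph of entities, scores each edge with pointwise mutual information, and counts the triangles each edge participates in. It is not the paper's algorithm: the abstract does not give the exact definitions, so the use of document-level co-occurrence, the pointwise form of mutual information, and all function names and sample data here are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cooccurrence_graph(docs):
    """Count entities and entity pairs over documents (each a set of entity names)."""
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in docs:
        unique = sorted(set(entities))
        entity_counts.update(unique)
        # Each unordered pair is stored once, as an alphabetically sorted tuple.
        pair_counts.update(combinations(unique, 2))
    return entity_counts, pair_counts


def pmi(pair, entity_counts, pair_counts, n_docs):
    """Pointwise mutual information of an edge (assumed measure, see lead-in)."""
    a, b = pair
    p_ab = pair_counts[pair] / n_docs
    p_a = entity_counts[a] / n_docs
    p_b = entity_counts[b] / n_docs
    return log2(p_ab / (p_a * p_b))


def triangles_per_edge(pair_counts):
    """For each edge, count the vertices adjacent to both of its endpoints."""
    neighbors = {}
    for a, b in pair_counts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {(a, b): len(neighbors[a] & neighbors[b]) for a, b in pair_counts}


# Hypothetical toy corpus: each document is the set of entities it mentions.
docs = [
    {"Alice", "Bob", "Carol"},
    {"Alice", "Bob"},
    {"Carol", "Dave"},
    {"Alice", "Bob", "Carol"},
]
entity_counts, pair_counts = cooccurrence_graph(docs)
tri = triangles_per_edge(pair_counts)
scores = {
    pair: pmi(pair, entity_counts, pair_counts, len(docs)) for pair in pair_counts
}
for pair in sorted(pair_counts):
    print(pair, "triangles:", tri[pair], "PMI:", round(scores[pair], 3))
```

A hierarchical clustering step could then merge or split groups of vertices using edge scores like these; how the paper combines the two signals is not stated in the abstract.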
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A short walk in the Blogistan
The increasingly prominent subset of Web pages called ‘blogs’ differs from traditional Web pages both in characteristics and in potential for applications. We explore three aspects of the blogistan: its overall scope and size, the identification of emerging hot topics of discussion and link patterns, and implications both for blogs and for applications such as search. Beyond blogs, we develop a general...
Expert Discovery: A web mining approach
Expert discovery is the search for an answer to the question: “Who is the best expert on a specific subject in a particular domain, within a particular array of parameters?” Experts with domain knowledge in any field are crucial for consulting in industry, academia, and the scientific community. The aim of this study is to address the issues of the expert-finding task in real-world communities. Collabor...
Pervasive Web Community Structure Summarization: A Machine Learning Approach
Although community discovery based on social networks has been studied extensively in the Web hyperlink environment, limited research has been done in the case of Web documents. The co-occurrence of words and entities in sentences and documents usually implies some connection among them. Studying such connections may reveal important relationships. In this paper, we investigate the co-occurrenc...
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One approach to web mining is evolutionary algorithms, which search according to the user interests...
Publication date: 2006